Skip to content

5370 sync upstream sql parser#5372

Merged
aleks-f merged 38 commits into
mainfrom
5370-sync-upstream-sql-parser
May 29, 2026
Merged

5370 sync upstream sql parser#5372
aleks-f merged 38 commits into
mainfrom
5370-sync-upstream-sql-parser

Conversation

@aleks-f
Copy link
Copy Markdown
Member

@aleks-f aleks-f commented May 27, 2026

No description provided.

aleks-f added 8 commits May 27, 2026 12:36
Re-import the entire vendored sql-parser tree from upstream
hyrise/sql-parser at commit ccd3f68. Brings in 13 upstream commits
since the previous baseline c247124:

- 9ca19ce Correctly parse named columns for joins
- 1969a3c Allow more expressions for selected statements
- 57c763a Make COPY statement options compliant to Postgres
- 87e6eac Add FOREIGN KEY constraints to CREATE TABLE
- 0663319 Support NULLS FIRST and NULLS LAST in ORDER BY
- 3456f97 CSV import options for DELIMITER, NULL and QUOTE
- ccd3f68 Add schema to functions
- 139375e Include cstdint for GCC-13
- 0ede821 Add some newlines at the end of files
- plus four CI-only commits (no effect on vendored code)

Adds new file src/sql/ImportExportOptions.h. Drops the local
sqlparser_win.h header and all Poco-side patches (export macro,
SQLParserResult pointerization, MSVC pragmas, Makefile tweaks),
which are re-applied in the follow-up commit.
Restores Poco-side modifications dropped by the upstream sync:

- src/sqlparser_win.h: Windows DLL export macros (SQLParser_API
  driven by Data_API when included from Poco, or SQLParser_EXPORTS
  standalone), _WIN32 strncasecmp/strcasecmp aliases, GCC visibility
  fallback.
- SQLParser_API annotation on every public class/struct/free function
  in src/SQLParser.h, src/SQLParserResult.h, all of src/sql/*.h
  (including new upstream types CsvOptions, ImportExportOptions,
  ReferencesSpecification, ForeignKeyConstraint, ColumnConstraints),
  and src/util/sqlhelper.h.
- sqlparser_win.h includes in SQLParserResult.h, sql/ColumnType.h,
  sql/SQLStatement.h, util/sqlhelper.h.
- SQLParser.h: [[deprecated(Use parse())]] attributes on the legacy
  parseSQLString overloads.
- SQLParserResult.{h,cpp}: pointerize statements_ and parameters_ as
  pointer-to-vector with mutable lazy allocation; reset(bool mv)
  supports move-construct case; setErrorDetails frees previous
  errorMsg_ to plug a leak. Upstream did not touch this file in the
  imported range so the patch applied cleanly.
- src/parser/bison_parser.cpp: MSVC pragma (disable: 4996, 4267)
  around generated code.
- src/parser/flex_lexer.{cpp,h}: MSVC pragma around generated code,
  Windows io.h replacement for unistd.h, __has_include(<stdint.h>)
  guard on flex INT*_MIN/MAX fallbacks.
- src/sql/Expr.cpp: MSVC pragma around strncpy use in substr().
- Makefile: add -I. -Isrc/ to LIB_CFLAGS and TEST_CFLAGS, normalize
  TEST_CFLAGS to -std=c++17.

Verified: standalone make + make test pass (sql_tests success, no
leaks). Poco Data shared library (debug and release) builds and
links cleanly.
…er.h #5370

Restored fork-local removal of #include sql/statements.h from
SQLParser.h. The include was originally pulled in transitively by
SQLParser.h, but it dragged the full per-statement type set into
every translation unit that included SQLParser.h, which created
ambiguity issues elsewhere in the Poco tree. Consumers that need
the concrete statement types include sql/statements.h directly.
SQLParserResult.h still pulls in SQLStatement.h, which is enough
for Poco::Data::Statement which only touches SQLStatement::type().
…5370

Adds a Linux job that builds the vendored sql-parser sources with
its own Makefile and runs the upstream test suite. Closes the
coverage gap where Poco CI compiles SQLParser into libPocoData but
only exercises it indirectly through Poco::Data::Statement smoke
tests in DataTest. The new job actually parses queries-good.sql and
queries-bad.sql, runs valgrind for leaks, and re-runs bison to
check the grammar is conflict-free.

Triggered by the new sqlparser path filter (Data/SQLParser only),
so it does not run on every Data change. bison, flex, and valgrind
are installed via apt; ubuntu-24.04 ships versions well above the
3.0 / 2.6 minimums the Makefile requires.
…5370

Extends the vendored hyrise sql-parser to recognize two placeholder
syntaxes in addition to the existing standard ?:

- $N (postgresql 1-based explicit numbering, e.g. WHERE a = $1)
- :name (sqlite named, e.g. WHERE a = :user)

Implementation:

- New lexer rules for the two patterns, placed before the punctuation
  catch-all. $ was previously unused. : remained a punctuation token
  but had no grammar references, so the longer :identifier match wins
  without conflict.
- New bison tokens DOLLAR_PARAM (carries the int N) and NAMED_PARAM
  (carries the identifier). param_expr now accepts all three forms.
  $0 is rejected via YYERROR at parse time.
- Three ExprType values: kExprParameter for ?, kExprParameterDollar
  for $N (ival = N), kExprParameterNamed for :name (name = identifier).
  This keeps consumer switch statements explicit instead of overloading
  a single enum with discriminator fields.
- Factory functions Expr::makeDollarParameter and Expr::makeNamedParameter.
  isLiteral returns true for all three. The destructor already frees
  name so no further changes are needed.
- The top-level input rule only renumbers ? placeholders sequentially;
  $N keeps its explicit ival, :name keeps its name. addParameter still
  sorts by ival, which now orders $N by explicit position and clusters
  named placeholders.
- sqlhelper print case extended to handle the two new ExprType values.
- sql_asserts.h now pulls sql/statements.h, restoring the include that
  prepare_tests.cpp and friends used to get transitively through
  SQLParser.h before the earlier sync removed it.

Verified: standalone make + make test pass cleanly. All three checks
green (sql tests including three new placeholder cases, valgrind, and
bison grammar conflict). Poco Data lib (debug and release shared)
builds and links cleanly via the poco_env.bash flow.
Imports Poco::Data::Utility from macchina.io:

- Utility::renderValue<T> renders a single C++ value as a SQL literal.
  Handles nullptr, Poco::Nullable, std::optional, bool, integral and
  floating types, Date/Time/DateTime/LocalDateTime, BLOB (X'...' hex),
  CLOB, Poco::Dynamic::Var (recursive), and anything string-view-like
  (single-quoted with internal ' doubled). Unknown types fall back to
  ? so the rendered text stays valid-shaped.

- Utility::boundSQL<...> renders a parameterised SQL template by
  substituting placeholders with stringified bindings. Currently a
  hand-rolled state-machine scanner supporting ? and $N styles with
  literal-awareness. (Switch to parser-backed substitution in a
  follow-up commit.)

- Utility::executeSQL<...> wraps Statement execution, captures stmt,
  bindings, timing, rows-affected/error and reports the outcome via a
  structured ExecResult callback.

- Utility::toJSON serialises an ExecResult as a JSON object with RFC
  8259 string escaping.

Adds 22 UtilityTest cases under Data/SQLite/testsuite covering
renderValue dispatch, boundSQL placeholder handling and arity errors,
executeSQL success/constraint/malformed/null paths, and toJSON
formatting. The testsuite passes against the in-tree hand-rolled
scanner; the follow-up parser refactor must keep it green.
The placeholder param_expr actions now set ival2 to the source column
(0-based) of the placeholder's first character. For ? the column is
yylloc.total_column - 1, for $N total_column - (1 + digits of N), for
:name total_column - 1 - strlen(name).

ival2 was previously written once for ? (yyloc.param_list.size(), which
duplicated the sequential id later stored in ival by the top-level
renumber loop) and never read anywhere in Poco. Repurposing it for the
source column gives all three placeholder kinds a consistent meaning:
ival -> id (sequential for ?, explicit N for $N, 0 for :name);
ival2 -> source column.

This lets consumers walk parameters() sorted by ival2 and substitute
placeholders against the original SQL text without having to re-scan
it themselves.

prepare_tests.cpp gains ival2 assertions on the four placeholder cases
(? / $N in-order / $N out-of-order / :name) validating the computed
column positions match hand-counted offsets.

The top-level input rule's renumber loop is unchanged - it only writes
ival for kExprParameter, leaving ival2 alone.

Verified: standalone make test passes all three checks.
…y::boundSQL #5371

Utility::boundSQL now parses the input SQL with hsql::SQLParser,
walks result.parameters() sorted by Expr::ival2 (source column),
and substitutes by position. Replaces the hand-rolled state-machine
scanner that duplicated lexer responsibilities (literal-awareness,
$N parsing, mixing detection). The parser handles single-quoted
literals natively, so placeholders inside strings are inert.

- ? and $N substitute as before; arity and mixing errors throw
  Poco::InvalidArgumentException with the same call sites/messages.
- :name placeholders are now pass-through (left verbatim in the
  output). The renderer does not bind by name; future Statement-level
  work can complete that.
- Unparseable SQL throws InvalidArgumentException; executeSQL
  catches and falls back to the raw template string, matching the
  pre-existing contract that callbacks always get a populated sql.
- The POCO_DATA_NO_SQL_PARSER build retains the original hand-rolled
  scanner as a fallback so the no-parser CI matrix keeps working.

Also lifts the upstream parser's ban on ? in extended_literal
contexts (INSERT VALUES, EXECUTE args). The rejection at
bison_parser.y:1098 ("Parameter ? is not a valid literal.") was
inconsistent with the newer $N / :name placeholders, which the same
rule accepts. Common INSERT INTO t VALUES(?, ?, ?) patterns now
parse natively. Two queries-bad.sql cases that asserted the
rejection move to queries-good.sql.

UtilityTest SQL strings updated to wrap VALUES(...) fragments inside
INSERT INTO t — the parser-based path validates structure, so bare
VALUES() fragments are no longer accepted (they were a quirk of the
text-only scanner). testExecuteSQLMalformed adjusted to assert the
fall-through behavior (raw sql preserved when boundSQL throws).

Verified: SQLite testsuite 122 tests pass (SQLiteTest 100 +
UtilityTest 22). Standalone SQLParser make test passes all three
checks. flex_lexer.{cpp,h} were regenerated but the byte-identical
output to HEAD means they are not in this commit.
The previous commit included SQLParser.h directly in Utility.h so the
templated boundSQL body could reach hsql types. That worked for the
make build (the SQLite testsuite Makefile adds the SQLParser include
paths via target_includes), but the cmake build keeps SQLParser as
PRIVATE on the Data target, so any consumer of Utility.h (such as
DataSQLite-testrunner) failed with:

  Utility.h:31:10: fatal error: SQLParser.h: No such file or directory

Split boundSQL into a thin template that only renders the binding
values plus a non-template back-end boundSQLImpl that lives in
Utility.cpp and owns the parser-based substitution. The public
header now needs no SQLParser headers at all; the parser dependency
is purely internal to PocoData.

Verified: cmake build with parser-enabled and parser-disabled
matrices both link DataSQLite-testrunner; the 22 UtilityTest cases
pass under the cmake build.
@aleks-f aleks-f marked this pull request as draft May 27, 2026 13:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR syncs the vendored Data/SQLParser with upstream changes (plus Poco-specific extensions) and adds a new Poco::Data::Utility helper for rendering/binding SQL for diagnostics, structured execution reporting, and JSON serialization, along with new SQLite tests and CI coverage for the standalone SQLParser suite.

Changes:

  • Updated SQL parser grammar/AST to support new syntax (e.g., FOREIGN KEY constraints, JOIN ... USING columns list, ORDER BY NULLS FIRST/LAST, schema-qualified functions, COPY ... WITH (...) options, $N and :name placeholders).
  • Added Poco::Data::Utility (renderValue/boundSQL/executeSQL/toJSON) plus SQLite tests covering the new behavior.
  • Extended GitHub Actions to run upstream sql-parser build+tests (including valgrind + grammar conflict checks) when SQLParser files change.

Reviewed changes

Copilot reviewed 37 out of 56 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
Data/src/Utility.cpp Implements SQL literal rendering + ExecResult JSON serialization helpers.
Data/include/Poco/Data/Utility.h Adds Poco::Data::Utility API (renderValue/boundSQL/executeSQL/toJSON).
Data/Makefile Adds Utility object to Data library build.
Data/SQLParser/test/tpc_h_tests.cpp Formatting adjustments for TPCH grammar tests.
Data/SQLParser/test/thirdparty/microtest/microtest.h Macro refactor/formatting for the test framework.
Data/SQLParser/test/test.sh Refactors test script result handling and summary logic.
Data/SQLParser/test/sql_tests.cpp Updates/extends parser unit tests (FKs, COPY options, intervals, etc.).
Data/SQLParser/test/sql_parser.cpp Formatting-only changes in tokenization tests.
Data/SQLParser/test/sql_asserts.h Updates test helper macros / includes.
Data/SQLParser/test/select_tests.cpp Extends SELECT tests (NULL ordering, JOIN USING, schema-qualified functions).
Data/SQLParser/test/queries/queries-good.sql Adds new “good” grammar cases for new features.
Data/SQLParser/test/queries/queries-bad.sql Adds new “bad” grammar cases (negative tests) for new features.
Data/SQLParser/test/prepare_tests.cpp Adds $N and :name placeholder tests + column position tracking.
Data/SQLParser/test/auto_query_file_test.cpp Formatting-only changes to the query-file-driven grammar test.
Data/SQLParser/src/util/sqlhelper.cpp Extends AST printer to handle new Expr/placeholder types.
Data/SQLParser/src/SQLParserResult.h Minor formatting cleanup.
Data/SQLParser/src/SQLParserResult.cpp Formatting cleanup in reset/releaseStatements.
Data/SQLParser/src/SQLParser.cpp Formatting cleanup in parse/tokenize.
Data/SQLParser/src/sql/Table.h Adds JoinDefinition::namedColumns to represent JOIN ... USING columns.
Data/SQLParser/src/sql/statements.cpp Updates statement implementations for new fields/options and cleanup.
Data/SQLParser/src/sql/SQLStatement.h Minor formatting cleanup.
Data/SQLParser/src/sql/SQLStatement.cpp Formatting cleanup in hints destruction.
Data/SQLParser/src/sql/ShowStatement.h Adds ShowStatement header (previously implicit).
Data/SQLParser/src/sql/SelectStatement.h Adds NULL ordering support to OrderDescription.
Data/SQLParser/src/sql/InsertStatement.h Adds InsertStatement header (previously implicit).
Data/SQLParser/src/sql/ImportStatement.h Refactors import/export option types into shared options + adds fields.
Data/SQLParser/src/sql/ImportExportOptions.h Introduces shared Import/Export options + CSV options.
Data/SQLParser/src/sql/Expr.h Adds new placeholder expr types + schema-qualified function support.
Data/SQLParser/src/sql/Expr.cpp Implements new Expr constructors + schema field + placeholder handling.
Data/SQLParser/src/sql/ExportStatement.h Refactors to use shared import/export options + adds fields.
Data/SQLParser/src/sql/DropStatement.h Minor formatting cleanup.
Data/SQLParser/src/sql/DeleteStatement.h Adds DeleteStatement header (previously implicit).
Data/SQLParser/src/sql/CreateStatement.h Adds FK constraint types + references specs + column constraints wrapper.
Data/SQLParser/src/sql/CreateStatement.cpp Implements FK/reference plumbing + moves Create-related impls here.
Data/SQLParser/src/sql/ColumnType.h Formatting + include fixes.
Data/SQLParser/src/sql/AlterStatement.h Adds AlterStatement header (previously implicit).
Data/SQLParser/src/parser/parser_typedef.h Minor formatting cleanup.
Data/SQLParser/src/parser/flex_lexer.l Adds lexer tokens for ENCODING/FOREIGN/REFERENCES and $N/:name.
Data/SQLParser/src/parser/flex_lexer.h Regenerated header formatting/line directives.
Data/SQLParser/src/parser/bison_parser.y Major grammar update for new syntax + conflict enforcement (%expect 0).
Data/SQLParser/src/parser/bison_parser.h Regenerated parser header updates for new tokens/types.
Data/SQLParser/README.md Updates upstream branch references (master → main).
Data/SQLParser/Makefile Fixes flags var + adds clang newline warning + parser-specific warning suppressions.
Data/SQLParser/benchmark/queries.cpp Formatting-only changes.
Data/SQLParser/benchmark/parser_benchmark.cpp Formatting-only changes.
Data/SQLParser/benchmark/benchmark.cpp Formatting-only changes.
Data/SQLParser/benchmark/benchmark_utils.h Formatting-only changes.
Data/SQLParser/benchmark/benchmark_utils.cpp Formatting-only changes.
Data/SQLite/testsuite/src/UtilityTest.h Adds SQLite testsuite coverage for Poco::Data::Utility.
Data/SQLite/testsuite/src/UtilityTest.cpp Implements Utility tests (rendering, binding, execution, JSON).
Data/SQLite/testsuite/src/SQLiteTestSuite.cpp Registers UtilityTest in SQLite testsuite.
Data/SQLite/testsuite/Makefile Adds UtilityTest to build objects.
.github/workflows/ci.yml Adds dedicated job to build/test the standalone SQLParser upstream suite on Linux.
.github/path-filters.yml Adds sqlparser path filter for targeted CI execution.
Comments suppressed due to low confidence (1)

Data/SQLParser/src/sql/SelectStatement.h:15

  • NullOrdering is declared as an unscoped enum, but the new code uses NullOrdering::Undefined/First/Last (e.g., in the parser and tests). This will not compile unless NullOrdering is a scoped enum (e.g., enum class) or all call sites are updated to use the unscoped enumerators (Undefined, First, Last).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Data/SQLParser/test/thirdparty/microtest/microtest.h Outdated
Comment thread Data/SQLParser/test/sql_tests.cpp
Comment thread Data/SQLParser/src/parser/bison_parser.y
Comment thread Data/SQLParser/src/sql/ImportExportOptions.h
Comment thread Data/SQLParser/src/util/sqlhelper.cpp Dismissed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Fixed
…stics #5371

RenderingBinder is an AbstractBinder subclass that captures bind()
traffic as SQL-literal strings instead of forwarding values to a
database driver. It overrides every scalar and container bind()
overload AbstractBinder exposes.

Utility::boundSQL now drives its bindings through a RenderingBinder
and reads back captures per (position, row) to produce the diagnostic
SQL. Three improvements over the previous renderValue-only path:

- STL containers (std::vector/deque/list of any scalar) now render
  with actual element values instead of '?'. The binder captures up
  to maxRows entries per binding; the default boundSQL renders only
  the first row and appends '-- (+N more rows)' if the container
  has more. boundSQLBulk(sql, maxRows, ...) lets callers ask for K
  rows joined by ';\n'.

- User-defined types with a custom TypeHandler<T> specialization
  render correctly out of the box: their bind() method calls back
  into pBinder->bind(pos+i, member, dir) for each column, and the
  RenderingBinder's scalar overrides capture each member's value.
  No user opt-in needed.

- The renderer's contract for OUT and IN_OUT directions is explicit:
  IN and IN_OUT capture the value being sent; OUT captures the
  literal '?' so the rendered SQL stays valid-shaped.

Zero overhead on the regular Statement.execute() path - the
RenderingBinder is constructed only when Utility::boundSQL or
Utility::executeSQL is called; it never replaces the database
driver's binder.

Utility's formatter primitives (quoteString, formatDate, formatTime,
formatDateTime, formatLocalDateTime, formatVar, formatBLOB,
formatCLOB) are now public statics so RenderingBinder reuses them
directly.

Three new UtilityTest cases:
- testBoundSQLVectorScalar - vector<int>{1,2,3} renders first row
  plus '(+2 more rows)' suffix.
- testBoundSQLVectorMaxRows3 - same vector at maxRows=3, renders
  three statements joined.
- testBoundSQLCustomTypeHandler - locally-defined TypeHandler<UPair>
  renders both columns with actual values.

The existing 22 cases keep passing under both the make build and
both cmake matrices (parser-enabled and POCO_DATA_NO_SQL_PARSER).
testExecuteSQLMalformed's expected captured.sql is now conditional
on POCO_DATA_NO_SQL_PARSER (parser path validates; scanner path
substitutes regardless of SQL validity).
@matejk
Copy link
Copy Markdown
Contributor

matejk commented May 27, 2026

Shall SQLParser be moved to dependencies as well? I was not aware that this is not original Poco work.

@aleks-f
Copy link
Copy Markdown
Member Author

aleks-f commented May 27, 2026

Shall SQLParser be moved to dependencies as well? I was not aware that this is not original Poco work.

I am not sure yet. Given the upstream scarce activity and non-responsiveness (it seems that all you can get is likes), we may take over our fork and break with upstream (we have lots of adaptive code already anyway). I gave it one last attempt to keep the sync and if there's no response, we'll go our own merry way.

aleks-f added 4 commits May 27, 2026 16:00
…:renderValue #5371

Eliminates the parallel fmt() overload ladder that lived in
RenderingBinder.cpp. RenderingBinder::recordScalar and recordContainer
now call Utility::renderValue<T> directly, making renderValue the
single source of truth for value -> SQL-literal conversion.

renderValue's if-constexpr ladder gains the binder-ABI scalars that
were previously missing:

- NullData -> NULL (matches AbstractBinder's NullData overload).
- char -> single-character quoted string literal (was previously
  catching as integral and rendering as a number; SQL semantics
  treat char as TEXT(1), not a numeric scalar).
- Poco::UUID -> quoted toString() output.
- Poco::UTF16String -> quoted UTF-8 text after UnicodeConverter::convert.

The char branch is placed before std::is_integral_v<T> so it
short-circuits; Int8/UInt8 remain numeric (they are signed/unsigned
char, distinct from char in the C++ type system).

Container capture uses an explicit value-type copy when iterating so
std::vector<bool>'s proxy iterators decay to bool before reaching
renderValue's overload set.

Verified across make + cmake (parser-enabled) + cmake
(POCO_DATA_NO_SQL_PARSER=ON): all 25 UtilityTest cases pass in every
matrix, including the three new container/TypeHandler cases.
…ndering #5371

DataException carries the raw template SQL (placeholders, no values). When
a Statement throws, operators see "INSERT INTO t VALUES(?, ?, ?)" rather
than the actual bound payload. Adding rendering inside DataException would
change default log shape and silently leak PII into every existing logger.
Instead, expose an opt-in helper callers invoke from a catch block:

  try { stmt.execute(); }
  catch (Poco::Data::DataException& e) {
      log("failed: " + Utility::boundSQL(stmt) + " - " + e.displayText());
  }

The helper walks the Statement's bindings, swaps each one's binder to a
RenderingBinder, drives bind() row by row up to maxRows (default 1), then
restores the original binder. Two reset() calls bracket the bind loop so
the helper works whether the Statement was previously executed or not, and
leaves bindings ready for a subsequent execute() / boundSQL() call.

To make the cascade safe, RenderingBinder::reset() is now a no-op: scalar
Binding<T>::reset() calls pBinder->reset(), which would otherwise wipe
captures from earlier walked bindings. The binder is scoped to one render
call and discarded, so there is nothing to clean up.

recordScalar() now appends to the per-position slot instead of overwriting,
so container bindings driven row-by-row accumulate per-row captures. For
scalar bindings (one bind() per pos) the slot still ends up with exactly
one entry, so the existing boundSQL<Args...> template path is unaffected.

Adds Statement::bindings() public accessors (impl() is protected) and
RenderingBinder::setTotalRows() so the helper can communicate the
binding-derived row count to the suffix logic in boundSQLImpl.

Five new UtilityTest cases cover scalar / vector / bulk / custom
TypeHandler / preserves-execute paths; all 30 cases pass in make,
cmake parser-enabled, and cmake POCO_DATA_NO_SQL_PARSER=ON matrices.
ColumnConstraints owns two heap-allocated members (constraints and
references, the latter containing heap-allocated ReferencesSpecification
pointers). On the happy path those pointers transfer into a
ColumnDefinition which deletes them in its destructor; on parse error
bison reaches the wildcard %destructor that only frees the wrapper,
leaking the inner members.

Add a dedicated %destructor for <column_constraints_t> that releases
the inner state before deleting the wrapper. valgrind on
queries-good.sql + queries-bad.sql now reports 15,209 allocs /
15,209 frees, no leaks.

Also correct a typo in microtest.h (inherited from upstream
hyrise/sql-parser): ASSERT_STRNEQ contained '!= = 0' which would fail
to compile if expanded. No callers use the macro so it was dormant,
but it is still broken on its face; change the condition to '== 0'
(the inverse of ASSERT_STREQ).

Backported to ~/sql-parser fork as fba2a75 / e2f2d23 on the
263-placeholders branch.
…5371

CodeQL's cpp/unused-static-variable and cpp/unused-local-variable rules
flag the parameter pack 'bindings' on Utility::boundSQL and
Utility::executeSQL as unused. The pack IS used via expansion (bindings...
forwarded to boundSQLBulk in one case, into renderings/stmt-binding in
the other), but CodeQL's data-flow doesn't track pack expansion through
call arguments. Annotate the parameter packs with [[maybe_unused]] to
suppress the false positive without changing functional behaviour.
Comment thread Data/include/Poco/Data/Utility.h Dismissed
Comment thread Data/include/Poco/Data/Utility.h Dismissed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Fixed
Comment thread Data/include/Poco/Data/Utility.h Dismissed
Comment thread Data/include/Poco/Data/Utility.h Dismissed
aleks-f added 2 commits May 27, 2026 18:41
bison_parser.cpp #includes flex_lexer.h, which is generated alongside
flex_lexer.cpp from flex_lexer.l. Under parallel make (-j$(nproc)), the
compilation of bison_parser.o could start before flex regenerated
flex_lexer.h, producing 'hsql_lex was not declared' errors in CI.

Add flex_lexer.cpp as an explicit prerequisite of bison_parser.o so make
serializes correctly. The reverse dep (flex_lexer.o depends on
bison_parser.cpp) was already wired up; this completes the pair.

Backported to ~/sql-parser fork as 0e5e4c6 on the 263-placeholders branch.
Follow-up to ec57eb4. CodeQL also flags 'bindings' and 'pos' inside
boundSQLBulk: both are used in the fold-expression body
(TypeHandler<Args>::bind(pos, bindings, ...), pos += ...), but
CodeQL's data-flow doesn't trace through pack expansion or fold
expressions. Annotate both with [[maybe_unused]] for consistency with
the other variadic wrappers and to suppress the false positives.
Utility::UpdateCallbackType declared the rowid parameter as Poco::Int64,
which is long on LP64 platforms (Linux x86_64, aarch64). sqlite3_update_hook
expects a callback whose rowid is sqlite3_int64 = long long. The previous
reinterpret_cast in eventHookRegister bridged the type mismatch, but calling
a function through a pointer of a different type than its actual signature
is undefined behavior in C++. UBSan (-fsanitize=function) flags it on every
update-hook invocation.

Align UpdateCallbackType and Notifier::sqliteUpdateCallbackFn to use long long
directly, matching sqlite3_int64 exactly. Drop the reinterpret_cast in
Utility::eventHookRegister. Update the test's matching callback signature.

Source-compat: existing callbacks declared with Poco::Int64 will need a
one-line signature swap to long long on Linux (Windows users are unaffected
because Poco::Int64 is already long long there). The change is value-
preserving on every supported platform.
Copy link
Copy Markdown
Contributor

@matejk matejk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design / code-quality review for the Poco-authored surface (Utility, RenderingBinder, Statement::bindings()) and the vendored design choices the Poco fork now owns. Auto-generated bison_parser.cpp / flex_lexer.cpp not reviewed line by line.

Leaving 8 inline comments split P1 (worth before merge) and P2 (design polish):

  • P1: RAII guard for the binder-swap loop; thread-safety docstring.
  • P2: API ergonomics on executeSQL(context), NullOrdering -> enum class, RAII for ColumnConstraints, doc fixes on tsMicros and formatBLOB dialect, possible missing vector<unsigned long> override.

P3 items (TSAN CI job, scanner-fallback hardening, fork-extension tags in Expr.h) intentionally not commented inline -- happy to file as follow-up issues if useful.

Comment thread Data/src/Utility.cpp
Comment thread Data/include/Poco/Data/Utility.h
Comment thread Data/include/Poco/Data/Utility.h Outdated
Comment thread Data/include/Poco/Data/Utility.h
Comment thread Data/include/Poco/Data/Utility.h
Comment thread Data/include/Poco/Data/RenderingBinder.h
Comment thread Data/SQLParser/src/sql/SelectStatement.h
Comment thread Data/SQLParser/src/sql/CreateStatement.h
aleks-f added 3 commits May 28, 2026 23:36
…ence #5373

MemoryDB is an in-memory SQLite database with sharded persistence to disk.
Behaves like a Poco::Data::Session (operator<<, into, use, now); persists
incrementally to a directory of shard files via a background flush timer,
at destruction, and on demand via flush(). Live data stays in RAM in the
active shard; sealed shards stay on disk and are attached read-only on
demand via attachArchived() / historyView().

Features:
- sharded persistence with size and age seal triggers
- retention policies: Options::retentionMaxAge and retentionMaxBytes
- always-on SQLITE_LIMIT_ATTACHED backstop on total shard count (oldest
  sealed shards auto-dropped at flush time when the cap would be exceeded;
  a warning is logged via the Poco.Data.SQLite.MemoryDB logger)
- history queries spanning the active shard plus sealed shards on disk,
  via attachArchived/detachArchived or full / time-range historyView()
- WITHOUT ROWID tables rejected up front (sharding partitions by rowid)
- thread-safe: attach/detach/historyView serialize internally via
  sqlite3_db_mutex so user statements on the shared Session never see a
  half-attached connection; attach bookkeeping is reference-counted so
  concurrent attachers of the same shard pair correctly
- crash-safe: shard files written via temp + atomic rename, manifest
  commits in a single transaction; worst-case loss bounded by the
  idle/max-flush window

Includes:
- Data/SQLite/{include,src}/MemoryDB.{h,cpp}
- Data/SQLite/testsuite/src/MemoryDBTest.{h,cpp}: ~30 tests covering
  persistence, retention, history views, deleteShard semantics, the
  always-on shard-ceiling auto-drop, and a 5-second concurrent stress
  test (1 writer + N readers/historians/admin) that completes clean
  under release and ThreadSanitizer
- Data/samples/MemoryDB: usage sample
- Wiring: SQLiteTestSuite registers MemoryDBTest (and the previously
  landed SQLiteThreadSafetyTest); testsuite Makefile/CMakeLists pull in
  MemoryDBTest under the same SQL parser guard as MemoryDB itself

Supporting Data layer changes folded in (originally #5371):
- Utility::executeSQL takes std::string context instead of
  std::unique_ptr<std::string>; empty string means none. Removes one heap
  allocation per call and reads better at call sites.
- Utility::boundSQL(stmt) wraps the per-binding binder swap in a RAII
  guard so an exception in a user TypeHandler<T>::bind cannot leave a
  binding pointing at a destroyed RenderingBinder.
- Utility.h doc updates: explicit not-thread-safe-with-respect-to-the-
  Statement contract on boundSQL(stmt) / boundSQLBulk(stmt); dialect
  caveat on the formatBLOB X'AABB' hex literal form (SQLite/MySQL/
  standard-SQL, not portable to PostgreSQL bytea or MS SQL 0xAABB).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 62 out of 81 changed files in this pull request and generated 3 comments.

Comment thread Data/SQLParser/src/SQLParser.cpp Outdated
Comment thread Data/include/Poco/Data/Utility.h
Comment thread Data/SQLite/testsuite/src/SQLiteThreadSafetyTest.cpp
aleks-f added 8 commits May 29, 2026 00:00
…oryDB #5373

Data/SQLite/CMakeLists.txt now adds Data/SQLParser{/src,} to DataSQLite's
PRIVATE include path when POCO_DATA_NO_SQL_PARSER is not defined,
mirroring Data/CMakeLists.txt's existing handling for the Data library.
Without it, MemoryDB.cpp's #include SQLParser.h failed to resolve under
cmake (the gnu-make build had this covered already via
Data/SQLite/Makefile's target_includes).

Also adds Data/SQLite/src/ to DataSQLite-testrunner's PRIVATE include path
so MemoryDBTest.cpp's #include <sqlite3.h> (used by
testShardCeilingAutoDrop to lower SQLITE_LIMIT_ATTACHED at runtime)
resolves via the symlink the bundled SQLite build drops there. The
testsuite Makefile already does this; this just brings cmake to parity.
…5370

Two leaks in SQLParser::tokenize():

1. SQL_NAMED_PARAM (added in #263) strdup()s the parameter name into
   yylval.sval at flex_lexer.l:247, but the cleanup branch only checked
   SQL_IDENTIFIER and SQL_STRING -- every :name placeholder leaked.

2. The loop lexed a token, pushed it, lexed the next, then checked the
   next token's type and freed yylval.sval. Because the second lex
   overwrote yylval, the first sval-bearing token's pointer was lost.
   What got freed was always the next token's sval -- correct in steady
   state but a guaranteed leak on the first sval-bearing token of every
   tokenize() call.

Restructure: lex once per iteration, free immediately if the token type
allocates an sval, push, loop. The token set mirrors bison's own
 at bison_parser.y:194, so the
direct-lex (tokenize) path and the parse path now clean up the same set.
…5371

Utility::renderValue<T> falls through to the
std::is_convertible_v<const T&, std::string_view> branch for T = const char*
or char*, then calls std::string_view(v). When v is null this hits the
string_view(const char*) constructor's contract violation (traits::length
on nullptr is UB). The nullptr-literal branch at the top of renderValue
only matches T = std::nullptr_t and so doesn't catch a pointer variable
that happens to be null.

Fold an is_pointer_v compile-time guard into the string_view branch so
character-pointer types null-check at runtime and return NULL, matching
what the bind layer would emit for a null pointer. std::string,
std::string_view, and other non-pointer types implicitly convertible to
string_view stay on the existing fast path (no runtime branch added).

Test: const char* and char* null variants assert NULL in
UtilityTest::testRenderValue, alongside the existing nullptr-literal case.
… dir #5373

Follow-up to b0efe72. That commit added /Data/SQLite/src/
(gnu-make) and /Data/SQLite/src/ (cmake) to the
testrunner include path so MemoryDBTest.cpp's #include <sqlite3.h>
would resolve. The Linux gnu-make build worked because Data/SQLite's
own Makefile runs an  prebuild that drops a sqlite3.h symlink there. The Windows cmake
build doesn't run that prebuild - sqlite3.h was never present at that
path, and ninja failed with:

    fatal error C1083: Cannot open include file: 'sqlite3.h':
    No such file or directory

Point at the canonical bundled SQLite header location instead:

  cmake:    PRIVATE $<TARGET_PROPERTY:SQLite::SQLite3,
                                       INTERFACE_INCLUDE_DIRECTORIES>
            (resolves to dependencies/sqlite3/src for bundled, or the
            find_package(SQLite3) result for POCO_SQLITE_UNBUNDLED;
            takes only the include dirs from the SQLite::SQLite3
            target, no link, so PocoDataSQLite's already-linked SQLite
            object library isn't pulled in a second time)

  gnu-make: /dependencies/sqlite3/src
            (always present in the source checkout; the system header
            on the default include path takes over for unbundled)

Both work on Linux and Windows, bundled or unbundled, regardless of
whether Data/SQLite's prebuild has ever run.
…tatic #5373

Follow-up to the previous testsuite include-path fix. MemoryDBTest's
testShardCeilingAutoDrop and testHistoryViewTimeRangeOverflow call
sqlite3_limit() directly to constrain the runtime SQLITE_LIMIT_ATTACHED
cap. The include resolution worked once we routed the test's <sqlite3.h>
through dependencies/sqlite3/src, but Windows clang-cmake then failed
at link:

    lld-link: error: undefined symbol: sqlite3_limit
    >>> referenced by ... MemoryDBTest.cpp.obj

The bundled SQLite is compiled into PocoDataSQLite.lib via an OBJECT
library (BUILD_LOCAL_INTERFACE PRIVATE). On Linux .so the symbols are
exported by default; on Windows static (.lib with POCO_STATIC) nothing
without __declspec(dllexport) is visible to downstream linkers, and
SQLITE_API doesn't emit one. Linking SQLite::SQLite3 directly into the
testrunner would re-pull the OBJECT library's .o files and clash with
PocoDataSQLite's already-linked copies.

Wrap it in POCO: Utility::setAttachedLimit(Session&, int newVal). The
wrapper carries the SQLite_API export through PocoDataSQLite's normal
public surface, so the testrunner (and any external caller) gets a
portable way to set SQLite's per-connection attached-DB cap without
needing the raw handle, <sqlite3.h>, or a direct link against the
bundled SQLite object library.

Tests now call Utility::setAttachedLimit instead of sqlite3_limit.
MemoryDBTest.cpp drops its #include <sqlite3.h>, and the include-path
fixes from the previous commit (testsuite CMakeLists pointing at
SQLite::SQLite3's INTERFACE_INCLUDE_DIRECTORIES, testsuite Makefile
pointing at dependencies/sqlite3/src) are reverted - no longer needed.
Audit-driven fixes to MemoryDB:

- _columnCopyCache populate-after-DDL race: a writer racing onDDL could
  reinstall the pre-DDL ColumnCopy and silently drop newly-added columns
  on the next flush. Snapshot _schemaVersion before releasing the lock
  and only install if unchanged.
- doFlush could miss a CREATE TABLE that landed in the sample-to-lock
  window: the new table was absent from the slice plan, its row stayed
  in main but never reached disk, and _dirty got cleared. Re-sample
  userTables after the IO loop and set _dirty if any appeared. The dtor
  now unconditionally flushes with allowSeal=false to catch this case
  without producing a fresh shard on shutdown.
- onDDL incremented _schemaVersion before _schemaLog.push_back; bad_alloc
  on vector growth desynchronized the two and the next catalog write
  committed version N+1 with N log entries. Swap the order so the only
  throwing op runs first.
- _sealRequested arriving after maybeSeal but before the post-IO _dirty
  recompute would be lost until the next user write. Fold it into the
  recompute.
- traceCallback rejected only on sql[0]==-; this matched the SQLite
  trigger-subprogram marker but also any user statement starting with a
  -- comment, letting a WITHOUT ROWID CREATE through via the session()
  bypass. Skip -- and slash-star comment prefixes before classification.
- WITHOUT ROWID rejection could be lost if the _rejected string
  assignment threw bad_alloc. Set a noexcept atomic _poisoned first;
  throwIfRejected uses an atomic fast path and a static fallback reason.
- detachArchived held _attachMutex across its 1ms retry sleep, blocking
  deleteShard (which holds _flushMutex) and transitively stalling flush.
  Release and re-acquire each iteration; the refcount is re-checked so
  concurrent attaches stay correct.
- deleteShard is now idempotent on a vanished id (returns silently
  instead of throwing) so retention loops over a stale shards() snapshot
  stay correct under concurrency.

Header docs updated to capture the three-mutex layering rationale and
the new contracts. Redundant inner _stateMutex acquisitions on _attached
removed since _attachMutex now solely covers it. New lastFlushTime()
accessor for time-bounded durability waits. testDeleteNonexistentShardRefused
updated for the new no-throw contract.
@aleks-f aleks-f linked an issue May 29, 2026 that may be closed by this pull request
aleks-f added 6 commits May 29, 2026 10:20
Binder.h and Extractor.h were the last two public headers including
<sqlite3.h>. Each did it only so a small inline template body
(bindLOB<T>, extractLOB<T>) could call sqlite3 free functions; the
other public SQLite headers had already moved to opaque forward
declarations.

Consumers of Poco::Data::SQLite therefore needed an sqlite3.h on
their include path, and any mismatch against the sqlite3.h POCO was
built with (system vs bundled, different SQLITE_VERSION, custom
SQLITE_API decoration) produced declaration/ODR errors at compile
or link time. On Windows static builds this was the same family of
conflicts as #5373.

Fix: forward-declare sqlite3_stmt as an opaque struct in both
headers and route the three sqlite3 calls through non-template
static helpers (Binder::bindBlobStatic, Extractor::columnBytes,
Extractor::columnBlobPtr) defined in the .cpp files. Template
bodies stay inline and the public API is unchanged; the include
graph just loses one level of unintended exposure.
Adds tests that lock down the parser-layer behaviour POCO's production
code relies on, plus reproductions of recent fix motivations so any
upstream sql-parser sync that regresses them fails loudly in the test
layer rather than silently in MemoryDB / Utility.

Data/testsuite/src/SQLParserTest:
- testResetClearsParameters: reparse loop with reset() between
  iterations - the lifecycle the upstream sval-leak fix hardened.
- testNamedParameter: :name placeholders parse to kExprParameterNamed
  with names {user, age}.
- testAlterDropColumnIfExists: down-casts to AlterStatement +
  DropColumnAction and asserts columnName, ifExists, table name.
- testDropDiscrimination: DROP TABLE vs DROP INDEX dispatch.
- testDeleteShape: DeleteStatement::expr nullptr distinguishes truncate
  from filtered delete - what MemoryDB branches on for the truncate
  optimization.
- testComments: -- line comments survive parsing across statements.
- testTokenize: backstops the SQLParser::tokenize loop the upstream
  sval-leak fix rewrote.
- Inline fix to the pre-existing testSQLParser failure formatter: L13
  was rendering with no separator between line and column; replaced
  with L<line>:C<col> and switched bare fail() to failmsg() so the
  formatted message actually surfaces on the CppUnit report.

Data/SQLite/testsuite/src/UtilityTest:
- testBoundSQLNamed: pins the current pass-through behaviour of :name
  placeholders in Utility::boundSQL (Utility.cpp kExprParameterNamed
  branch sets paramIndex=npos so the span is copied verbatim and the
  supplied args are silently ignored). If this branch is ever taught
  to substitute values, the test must be rewritten.

Data/SQLite/testsuite/src/MemoryDBTest:
- testCommentPrefixedWithoutRowidRejected: covers both operator<< and
  session() bypass paths with a leading -- comment. Pins the motivation
  for the leading-comment-skip rewrite in onStatement; the previous
  single-char trigger-marker check let -- prefixed CREATE bypass the
  trace-hook backstop.
- testDropTableClassified: DROP TABLE survives close-reopen via the
  schema log, proving the kStmtDrop + kDropTable dispatch.
- testAlterTableClassified: ALTER TABLE ADD COLUMN survives close-
  reopen. ADD COLUMN is not modelled as an AlterAction by hsql so this
  exercises the first-keyword DDL fallback in onStatement.
The bison/flex regen in 01f7ec2 overwrote a POCO-local hand-patch
in the generated flex_lexer.h that wrapped the unistd.h include in a
Windows guard. MSVC has no unistd.h, so clang-cl Windows builds
fail with:

  flex_lexer.h(470): fatal error: 'unistd.h' file not found

Re-apply the same conditional, with a POCO-LOCAL comment so the next
person who regens the lexer knows it has to be re-applied. flex's
%option set in flex_lexer.l has no equivalent for this - nounistd
would drop the include on every platform but flex's emitted runtime
calls (read/write/close/isatty) need either unistd.h or io.h, so the
conditional has to live in the generated header.
#5377

e23f9a1 patched the same guard in flex_lexer.h but missed the
matching include in flex_lexer.cpp:3184, which also fails to compile
on clang-cl Windows. Apply the identical conditional in the .cpp.
Same caveat as before: lives in the generated file, must be
re-applied after every flex regen.
@aleks-f aleks-f merged commit 7f269f2 into main May 29, 2026
97 of 98 checks passed
@aleks-f aleks-f deleted the 5370-sync-upstream-sql-parser branch May 29, 2026 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

4 participants